Differentially-Private Logistic Regression for Detecting Multiple-SNP Association in GWAS Databases
Abstract
Following the publication of an attack on genome-wide association studies (GWAS) data proposed by Homer et al., considerable attention has been given to developing methods for releasing GWAS data in a privacy-preserving way. Here, we develop an end-to-end differentially private method for solving regression problems with convex penalty functions and selecting the penalty parameters by cross-validation. In particular, we focus on penalized logistic regression with elastic-net regularization, a method widely used in GWAS analyses to identify disease-causing genes. We show how a differentially private procedure for penalized logistic regression with elastic-net regularization can be applied to the analysis of GWAS data and evaluate our method’s performance.
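As a rough illustration of the non-private objective the abstract refers to, the following sketch evaluates an elastic-net penalized logistic loss. The function name, the {-1, +1} label coding, and the parameterization of the penalty as `lam * (alpha * L1 + 0.5 * (1 - alpha) * L2)` are assumptions made for this example and are not taken from the paper. The differentially private mechanism and the private cross-validation step are not shown.

```python
import math

def elastic_net_logistic_loss(beta, X, y, lam, alpha):
    """Average logistic loss plus an elastic-net penalty (illustrative only).

    beta:  list of coefficients
    X:     list of feature rows (e.g. SNP genotype codes), one row per individual
    y:     labels coded as -1 (control) or +1 (case)  [assumed coding]
    lam:   overall penalty strength                   [assumed parameterization]
    alpha: mixing weight in [0, 1]; 1 is pure L1, 0 is pure L2
    """
    n = len(X)
    loss = 0.0
    for row, label in zip(X, y):
        margin = label * sum(b * x for b, x in zip(beta, row))
        # log(1 + exp(-margin)), written to avoid overflow for large |margin|
        if margin > 0:
            loss += math.log1p(math.exp(-margin))
        else:
            loss += -margin + math.log1p(math.exp(margin))
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    return loss / n + lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2)
```

A private variant would have to add calibrated randomness somewhere in the fitting procedure and account for the privacy cost of choosing `lam` and `alpha`; how the paper does this is not stated in the abstract.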
Similar resources
Privacy-Preserving Data Sharing for Genome-Wide Association Studies
Traditional statistical methods for confidentiality protection of statistical databases do not scale well to deal with GWAS (genome-wide association studies) databases especially in terms of guarantees regarding protection from linkage to external information. The more recent concept of differential privacy, introduced by the cryptographic community, is an approach which provides a rigorous def...
Computational Approaches To Anti-Toxin Therapies And Biomarker Identification
Statistical Analysis: We utilized four publicly available case/control genome-wide association studies (GWAS) from dbGaP (access request # 1961) across multiple cancer types (including breast, melanoma, lung and prostate cancers)(14, 523, 524, 526-528) to determine if SNPs or haplotypes constructed from SNPs in our genes of interest are associated with a disease phenotype. Additionally, we de...
Comparison of dimension reduction-based logistic regression models for case-control genome-wide association study: principal components analysis vs. partial least squares
With recent advances in biotechnology, the genome-wide association study (GWAS) has been widely used to identify genetic variants that underlie human complex diseases and traits. In case-control GWAS, the typical statistical strategy is traditional logistic regression (LR) based on single-locus analysis. However, such a single-locus analysis leads to the well-known multiplicity problem, with a risk o...
METAINTER: meta-analysis of multiple regression models in genome-wide association studies
MOTIVATION: Meta-analysis of summary statistics is an essential approach to guarantee the success of genome-wide association studies (GWAS). Application of the fixed or random effects model to single-marker association tests is a standard practice. More complex methods of meta-analysis involving multiple parameters have not been used frequently, a gap that could be explained by the lack of a res...
Scalable privacy-preserving data sharing methodology for genome-wide association studies
The protection of privacy of individual-level information in genome-wide association study (GWAS) databases has been a major concern of researchers following the publication of "an attack" on GWAS data by Homer et al. (2008). Traditional statistical methods for confidentiality and privacy protection of statistical databases do not scale well to deal with GWAS data, especially in terms of guaran...